R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ stringr 1.4.0
## ✓ tidyr   1.0.2     ✓ forcats 0.5.0
## ✓ readr   1.3.1
## ── Conflicts ──────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(missForest)
## Loading required package: randomForest
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
## The following object is masked from 'package:dplyr':
## 
##     combine
## Loading required package: foreach
## 
## Attaching package: 'foreach'
## The following objects are masked from 'package:purrr':
## 
##     accumulate, when
## Loading required package: itertools
## Loading required package: iterators
library(mice)
## 
## Attaching package: 'mice'
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
library(arm)
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## Loading required package: lme4
## 
## arm (Version 1.10-1, built: 2018-4-12)
## Working directory is /Users/Yen/Documents/master degree/MIS/Spring 2020/STATISTICAL AND PREDICTIVE ANALYTICS/IS 6489 Group Project
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
library(moments)

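Summary of the Training Data

You can also embed output without showing the code that produced it. For example, here is a summary of the training data (1,460 rows, Id 1 to 1460):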

##        Id           MSSubClass       MSZoning     LotFrontage    
##  Min.   :   1.0   Min.   : 20.0   C (all):  10   Min.   : 21.00  
##  1st Qu.: 365.8   1st Qu.: 20.0   FV     :  65   1st Qu.: 59.00  
##  Median : 730.5   Median : 50.0   RH     :  16   Median : 69.00  
##  Mean   : 730.5   Mean   : 56.9   RL     :1151   Mean   : 70.05  
##  3rd Qu.:1095.2   3rd Qu.: 70.0   RM     : 218   3rd Qu.: 80.00  
##  Max.   :1460.0   Max.   :190.0                  Max.   :313.00  
##                                                  NA's   :259     
##     LotArea        Street      Alley      LotShape  LandContour  Utilities   
##  Min.   :  1300   Grvl:   6   Grvl:  50   IR1:484   Bnk:  63    AllPub:1459  
##  1st Qu.:  7554   Pave:1454   Pave:  41   IR2: 41   HLS:  50    NoSeWa:   1  
##  Median :  9478               NA's:1369   IR3: 10   Low:  36                 
##  Mean   : 10517                           Reg:925   Lvl:1311                 
##  3rd Qu.: 11602                                                              
##  Max.   :215245                                                              
##                                                                              
##    LotConfig    LandSlope   Neighborhood   Condition1     Condition2  
##  Corner : 263   Gtl:1382   NAmes  :225   Norm   :1260   Norm   :1445  
##  CulDSac:  94   Mod:  65   CollgCr:150   Feedr  :  81   Feedr  :   6  
##  FR2    :  47   Sev:  13   OldTown:113   Artery :  48   Artery :   2  
##  FR3    :   4              Edwards:100   RRAn   :  26   PosN   :   2  
##  Inside :1052              Somerst: 86   PosN   :  19   RRNn   :   2  
##                            Gilbert: 79   RRAe   :  11   PosA   :   1  
##                            (Other):707   (Other):  15   (Other):   2  
##    BldgType      HouseStyle   OverallQual      OverallCond      YearBuilt   
##  1Fam  :1220   1Story :726   Min.   : 1.000   Min.   :1.000   Min.   :1872  
##  2fmCon:  31   2Story :445   1st Qu.: 5.000   1st Qu.:5.000   1st Qu.:1954  
##  Duplex:  52   1.5Fin :154   Median : 6.000   Median :5.000   Median :1973  
##  Twnhs :  43   SLvl   : 65   Mean   : 6.099   Mean   :5.575   Mean   :1971  
##  TwnhsE: 114   SFoyer : 37   3rd Qu.: 7.000   3rd Qu.:6.000   3rd Qu.:2000  
##                1.5Unf : 14   Max.   :10.000   Max.   :9.000   Max.   :2010  
##                (Other): 19                                                  
##   YearRemodAdd    RoofStyle       RoofMatl     Exterior1st   Exterior2nd 
##  Min.   :1950   Flat   :  13   CompShg:1434   VinylSd:515   VinylSd:504  
##  1st Qu.:1967   Gable  :1141   Tar&Grv:  11   HdBoard:222   MetalSd:214  
##  Median :1994   Gambrel:  11   WdShngl:   6   MetalSd:220   HdBoard:207  
##  Mean   :1985   Hip    : 286   WdShake:   5   Wd Sdng:206   Wd Sdng:197  
##  3rd Qu.:2004   Mansard:   7   ClyTile:   1   Plywood:108   Plywood:142  
##  Max.   :2010   Shed   :   2   Membran:   1   CemntBd: 61   CmentBd: 60  
##                                (Other):   2   (Other):128   (Other):136  
##    MasVnrType    MasVnrArea     ExterQual ExterCond  Foundation  BsmtQual  
##  BrkCmn : 15   Min.   :   0.0   Ex: 52    Ex:   3   BrkTil:146   Ex  :121  
##  BrkFace:445   1st Qu.:   0.0   Fa: 14    Fa:  28   CBlock:634   Fa  : 35  
##  None   :864   Median :   0.0   Gd:488    Gd: 146   PConc :647   Gd  :618  
##  Stone  :128   Mean   : 103.7   TA:906    Po:   1   Slab  : 24   TA  :649  
##  NA's   :  8   3rd Qu.: 166.0             TA:1282   Stone :  6   NA's: 37  
##                Max.   :1600.0                       Wood  :  3             
##                NA's   :8                                                   
##  BsmtCond    BsmtExposure BsmtFinType1   BsmtFinSF1     BsmtFinType2
##  Fa  :  45   Av  :221     ALQ :220     Min.   :   0.0   ALQ :  19   
##  Gd  :  65   Gd  :134     BLQ :148     1st Qu.:   0.0   BLQ :  33   
##  Po  :   2   Mn  :114     GLQ :418     Median : 383.5   GLQ :  14   
##  TA  :1311   No  :953     LwQ : 74     Mean   : 443.6   LwQ :  46   
##  NA's:  37   NA's: 38     Rec :133     3rd Qu.: 712.2   Rec :  54   
##                           Unf :430     Max.   :5644.0   Unf :1256   
##                           NA's: 37                      NA's:  38   
##    BsmtFinSF2        BsmtUnfSF       TotalBsmtSF      Heating     HeatingQC
##  Min.   :   0.00   Min.   :   0.0   Min.   :   0.0   Floor:   1   Ex:741   
##  1st Qu.:   0.00   1st Qu.: 223.0   1st Qu.: 795.8   GasA :1428   Fa: 49   
##  Median :   0.00   Median : 477.5   Median : 991.5   GasW :  18   Gd:241   
##  Mean   :  46.55   Mean   : 567.2   Mean   :1057.4   Grav :   7   Po:  1   
##  3rd Qu.:   0.00   3rd Qu.: 808.0   3rd Qu.:1298.2   OthW :   2   TA:428   
##  Max.   :1474.00   Max.   :2336.0   Max.   :6110.0   Wall :   4            
##                                                                            
##  CentralAir Electrical     X1stFlrSF      X2ndFlrSF     LowQualFinSF    
##  N:  95     FuseA:  94   Min.   : 334   Min.   :   0   Min.   :  0.000  
##  Y:1365     FuseF:  27   1st Qu.: 882   1st Qu.:   0   1st Qu.:  0.000  
##             FuseP:   3   Median :1087   Median :   0   Median :  0.000  
##             Mix  :   1   Mean   :1163   Mean   : 347   Mean   :  5.845  
##             SBrkr:1334   3rd Qu.:1391   3rd Qu.: 728   3rd Qu.:  0.000  
##             NA's :   1   Max.   :4692   Max.   :2065   Max.   :572.000  
##                                                                         
##    GrLivArea     BsmtFullBath     BsmtHalfBath        FullBath    
##  Min.   : 334   Min.   :0.0000   Min.   :0.00000   Min.   :0.000  
##  1st Qu.:1130   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:1.000  
##  Median :1464   Median :0.0000   Median :0.00000   Median :2.000  
##  Mean   :1515   Mean   :0.4253   Mean   :0.05753   Mean   :1.565  
##  3rd Qu.:1777   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:2.000  
##  Max.   :5642   Max.   :3.0000   Max.   :2.00000   Max.   :3.000  
##                                                                   
##     HalfBath       BedroomAbvGr    KitchenAbvGr   KitchenQual  TotRmsAbvGrd   
##  Min.   :0.0000   Min.   :0.000   Min.   :0.000   Ex:100      Min.   : 2.000  
##  1st Qu.:0.0000   1st Qu.:2.000   1st Qu.:1.000   Fa: 39      1st Qu.: 5.000  
##  Median :0.0000   Median :3.000   Median :1.000   Gd:586      Median : 6.000  
##  Mean   :0.3829   Mean   :2.866   Mean   :1.047   TA:735      Mean   : 6.518  
##  3rd Qu.:1.0000   3rd Qu.:3.000   3rd Qu.:1.000               3rd Qu.: 7.000  
##  Max.   :2.0000   Max.   :8.000   Max.   :3.000               Max.   :14.000  
##                                                                               
##  Functional    Fireplaces    FireplaceQu   GarageType   GarageYrBlt  
##  Maj1:  14   Min.   :0.000   Ex  : 24    2Types :  6   Min.   :1900  
##  Maj2:   5   1st Qu.:0.000   Fa  : 33    Attchd :870   1st Qu.:1961  
##  Min1:  31   Median :1.000   Gd  :380    Basment: 19   Median :1980  
##  Min2:  34   Mean   :0.613   Po  : 20    BuiltIn: 88   Mean   :1979  
##  Mod :  15   3rd Qu.:1.000   TA  :313    CarPort:  9   3rd Qu.:2002  
##  Sev :   1   Max.   :3.000   NA's:690    Detchd :387   Max.   :2010  
##  Typ :1360                               NA's   : 81   NA's   :81    
##  GarageFinish   GarageCars      GarageArea     GarageQual  GarageCond 
##  Fin :352     Min.   :0.000   Min.   :   0.0   Ex  :   3   Ex  :   2  
##  RFn :422     1st Qu.:1.000   1st Qu.: 334.5   Fa  :  48   Fa  :  35  
##  Unf :605     Median :2.000   Median : 480.0   Gd  :  14   Gd  :   9  
##  NA's: 81     Mean   :1.767   Mean   : 473.0   Po  :   3   Po  :   7  
##               3rd Qu.:2.000   3rd Qu.: 576.0   TA  :1311   TA  :1326  
##               Max.   :4.000   Max.   :1418.0   NA's:  81   NA's:  81  
##                                                                       
##  PavedDrive   WoodDeckSF      OpenPorchSF     EnclosedPorch      X3SsnPorch    
##  N:  90     Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  P:  30     1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00  
##  Y:1340     Median :  0.00   Median : 25.00   Median :  0.00   Median :  0.00  
##             Mean   : 94.24   Mean   : 46.66   Mean   : 21.95   Mean   :  3.41  
##             3rd Qu.:168.00   3rd Qu.: 68.00   3rd Qu.:  0.00   3rd Qu.:  0.00  
##             Max.   :857.00   Max.   :547.00   Max.   :552.00   Max.   :508.00  
##                                                                                
##   ScreenPorch        PoolArea        PoolQC       Fence      MiscFeature
##  Min.   :  0.00   Min.   :  0.000   Ex  :   2   GdPrv:  59   Gar2:   2  
##  1st Qu.:  0.00   1st Qu.:  0.000   Fa  :   2   GdWo :  54   Othr:   2  
##  Median :  0.00   Median :  0.000   Gd  :   3   MnPrv: 157   Shed:  49  
##  Mean   : 15.06   Mean   :  2.759   NA's:1453   MnWw :  11   TenC:   1  
##  3rd Qu.:  0.00   3rd Qu.:  0.000               NA's :1179   NA's:1406  
##  Max.   :480.00   Max.   :738.000                                       
##                                                                         
##     MiscVal             MoSold           YrSold        SaleType   
##  Min.   :    0.00   Min.   : 1.000   Min.   :2006   WD     :1267  
##  1st Qu.:    0.00   1st Qu.: 5.000   1st Qu.:2007   New    : 122  
##  Median :    0.00   Median : 6.000   Median :2008   COD    :  43  
##  Mean   :   43.49   Mean   : 6.322   Mean   :2008   ConLD  :   9  
##  3rd Qu.:    0.00   3rd Qu.: 8.000   3rd Qu.:2009   ConLI  :   5  
##  Max.   :15500.00   Max.   :12.000   Max.   :2010   ConLw  :   5  
##                                                     (Other):   9  
##  SaleCondition    SalePrice     
##  Abnorml: 101   Min.   : 34900  
##  AdjLand:   4   1st Qu.:129975  
##  Alloca :  12   Median :163000  
##  Family :  20   Mean   :180921  
##  Normal :1198   3rd Qu.:214000  
##  Partial: 125   Max.   :755000  
## 

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the summary above.

#Clean Train Data

#add new factor level.
train$Alley = factor(train$Alley, levels=c(levels(train$Alley), "No Alley Access"))

train$BsmtQual = factor(train$BsmtQual, levels=c(levels(train$BsmtQual), "No Bsmt"))

train$BsmtCond = factor(train$BsmtCond, levels=c(levels(train$BsmtCond), "No Bsmt"))

train$BsmtExposure = factor(train$BsmtExposure, levels=c(levels(train$BsmtExposure), "No Bsmt"))

train$BsmtFinType1 = factor(train$BsmtFinType1, levels=c(levels(train$BsmtFinType1), "No Bsmt"))

train$BsmtFinType2 = factor(train$BsmtFinType2, levels=c(levels(train$BsmtFinType2), "No Bsmt"))

train$FireplaceQu = factor(train$FireplaceQu, levels=c(levels(train$FireplaceQu), "No Fireplace"))

train$GarageType = factor(train$GarageType, levels=c(levels(train$GarageType), "No Garage"))

train$GarageFinish = factor(train$GarageFinish, levels=c(levels(train$GarageFinish), "No Garage"))

train$GarageQual = factor(train$GarageQual, levels=c(levels(train$GarageQual), "No Garage"))

train$GarageCond = factor(train$GarageCond, levels=c(levels(train$GarageCond), "No Garage"))

train$PoolQC = factor(train$PoolQC, levels=c(levels(train$PoolQC), "No Pool"))

train$Fence = factor(train$Fence, levels=c(levels(train$Fence), "No Fence"))

train$MiscFeature = factor(train$MiscFeature, levels=c(levels(train$MiscFeature), "None"))

#convert NAs in these columns to the new factor levels
train$Alley[is.na(train$Alley)] = "No Alley Access"

train$BsmtQual[is.na(train$BsmtQual)] = "No Bsmt"

train$BsmtCond[is.na(train$BsmtCond)] = "No Bsmt"

train$BsmtExposure[is.na(train$BsmtExposure)] = "No Bsmt"

train$BsmtFinType1[is.na(train$BsmtFinType1)] = "No Bsmt"

train$BsmtFinType2[is.na(train$BsmtFinType2)] = "No Bsmt"

train$FireplaceQu[is.na(train$FireplaceQu)] = "No Fireplace"

train$GarageType[is.na(train$GarageType)] = "No Garage"

train$GarageFinish[is.na(train$GarageFinish)] = "No Garage"

train$GarageQual[is.na(train$GarageQual)] = "No Garage"

train$GarageCond[is.na(train$GarageCond)] = "No Garage"

train$PoolQC[is.na(train$PoolQC)] = "No Pool"

train$Fence[is.na(train$Fence)] = "No Fence"

train$MiscFeature[is.na(train$MiscFeature)] = "None"
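
The two passes above (add a level, then recode NA to it) repeat the same pattern for every column. A minimal sketch of how that pattern could be folded into one helper follows. The helper name `na_to_level` and the toy data frame are hypothetical and not part of the original analysis; only the column/label pairs are taken from the code above.

```r
# Hypothetical helper: append `label` as a factor level, then recode NA to it.
na_to_level <- function(x, label) {
  x <- factor(x, levels = c(levels(factor(x)), label))
  x[is.na(x)] <- label
  x
}

# Toy data standing in for `train`; the values are made up for illustration.
toy <- data.frame(
  Alley  = factor(c("Grvl", NA, "Pave", NA)),
  PoolQC = factor(c(NA, "Ex", NA, NA))
)

# Column name -> replacement label, mirroring two of the pairs used above.
labels <- c(Alley = "No Alley Access", PoolQC = "No Pool")

for (col in names(labels)) {
  toy[[col]] <- na_to_level(toy[[col]], labels[[col]])
}

table(toy$Alley)
```

Wrapping the input in `factor()` also lets the helper accept character columns, which is what `read.csv()` returns by default in R 4.0 and later.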

#YearBuilt, change to age, remove the original column
train <- train %>% 
  dplyr::mutate(Age = 2020 - YearBuilt) %>% 
  dplyr::select(-YearBuilt)

#GarageYrBlt change to GarageAge
train <- train %>% 
  dplyr::mutate(GarageAge = 2020 - GarageYrBlt) %>% 
  dplyr::select(-GarageYrBlt)

#YearRemodAdd, change to difference between 2020 and YearRemodAdd
train <- train %>%
  dplyr::mutate(YearRemodAdd = 2020 - YearRemodAdd)
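
The three year conversions above all subtract from the same reference year. As a small illustration in base R, the sketch below keeps that year in one variable and shows what happens to missing garage years. The variable `ref_year` and the toy values are assumptions made for this example.

```r
ref_year <- 2020  # assumption: the same reference year as in the code above

# Toy rows standing in for `train`; the values are made up for illustration.
toy <- data.frame(
  YearBuilt    = c(1872, 1973, 2010),
  GarageYrBlt  = c(NA, 1980, 2010),
  YearRemodAdd = c(1950, 1994, 2010)
)

toy$Age          <- ref_year - toy$YearBuilt
toy$GarageAge    <- ref_year - toy$GarageYrBlt   # NA stays NA (no garage year recorded)
toy$YearRemodAdd <- ref_year - toy$YearRemodAdd  # overwritten in place, as above

# Drop the original year columns, as the dplyr::select() calls above do.
toy$YearBuilt   <- NULL
toy$GarageYrBlt <- NULL

toy
```

Note that `GarageAge` inherits the missing values of `GarageYrBlt`, so those rows still need handling before modeling.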
#load test data
test <- read.csv('test.csv') 
summary(test)
##        Id         MSSubClass        MSZoning     LotFrontage    
##  Min.   :1461   Min.   : 20.00   C (all):  15   Min.   : 21.00  
##  1st Qu.:1826   1st Qu.: 20.00   FV     :  74   1st Qu.: 58.00  
##  Median :2190   Median : 50.00   RH     :  10   Median : 67.00  
##  Mean   :2190   Mean   : 57.38   RL     :1114   Mean   : 68.58  
##  3rd Qu.:2554   3rd Qu.: 70.00   RM     : 242   3rd Qu.: 80.00  
##  Max.   :2919   Max.   :190.00   NA's   :   4   Max.   :200.00  
##                                                 NA's   :227     
##     LotArea       Street      Alley      LotShape  LandContour  Utilities   
##  Min.   : 1470   Grvl:   6   Grvl:  70   IR1:484   Bnk:  54    AllPub:1457  
##  1st Qu.: 7391   Pave:1453   Pave:  37   IR2: 35   HLS:  70    NA's  :   2  
##  Median : 9399               NA's:1352   IR3:  6   Low:  24                 
##  Mean   : 9819                           Reg:934   Lvl:1311                 
##  3rd Qu.:11518                                                              
##  Max.   :56600                                                              
##                                                                             
##    LotConfig    LandSlope   Neighborhood   Condition1    Condition2  
##  Corner : 248   Gtl:1396   NAmes  :218   Norm   :1251   Artery:   3  
##  CulDSac:  82   Mod:  60   OldTown:126   Feedr  :  83   Feedr :   7  
##  FR2    :  38   Sev:   3   CollgCr:117   Artery :  44   Norm  :1444  
##  FR3    :  10              Somerst: 96   RRAn   :  24   PosA  :   3  
##  Inside :1081              Edwards: 94   PosN   :  20   PosN  :   2  
##                            NridgHt: 89   RRAe   :  17                
##                            (Other):719   (Other):  20                
##    BldgType     HouseStyle   OverallQual      OverallCond      YearBuilt   
##  1Fam  :1205   1.5Fin:160   Min.   : 1.000   Min.   :1.000   Min.   :1879  
##  2fmCon:  31   1.5Unf:  5   1st Qu.: 5.000   1st Qu.:5.000   1st Qu.:1953  
##  Duplex:  57   1Story:745   Median : 6.000   Median :5.000   Median :1973  
##  Twnhs :  53   2.5Unf: 13   Mean   : 6.079   Mean   :5.554   Mean   :1971  
##  TwnhsE: 113   2Story:427   3rd Qu.: 7.000   3rd Qu.:6.000   3rd Qu.:2001  
##                SFoyer: 46   Max.   :10.000   Max.   :9.000   Max.   :2010  
##                SLvl  : 63                                                  
##   YearRemodAdd    RoofStyle       RoofMatl     Exterior1st   Exterior2nd 
##  Min.   :1950   Flat   :   7   CompShg:1442   VinylSd:510   VinylSd:510  
##  1st Qu.:1963   Gable  :1169   Tar&Grv:  12   MetalSd:230   MetalSd:233  
##  Median :1992   Gambrel:  11   WdShake:   4   HdBoard:220   HdBoard:199  
##  Mean   :1984   Hip    : 265   WdShngl:   1   Wd Sdng:205   Wd Sdng:194  
##  3rd Qu.:2004   Mansard:   4                  Plywood:113   Plywood:128  
##  Max.   :2010   Shed   :   3                  (Other):180   (Other):194  
##                                               NA's   :  1   NA's   :  1  
##    MasVnrType    MasVnrArea     ExterQual ExterCond  Foundation  BsmtQual  
##  BrkCmn : 10   Min.   :   0.0   Ex: 55    Ex:   9   BrkTil:165   Ex  :137  
##  BrkFace:434   1st Qu.:   0.0   Fa: 21    Fa:  39   CBlock:601   Fa  : 53  
##  None   :878   Median :   0.0   Gd:491    Gd: 153   PConc :661   Gd  :591  
##  Stone  :121   Mean   : 100.7   TA:892    Po:   2   Slab  : 25   TA  :634  
##  NA's   : 16   3rd Qu.: 164.0             TA:1256   Stone :  5   NA's: 44  
##                Max.   :1290.0                       Wood  :  2             
##                NA's   :15                                                  
##  BsmtCond    BsmtExposure BsmtFinType1   BsmtFinSF1     BsmtFinType2
##  Fa  :  59   Av  :197     ALQ :209     Min.   :   0.0   ALQ :  33   
##  Gd  :  57   Gd  :142     BLQ :121     1st Qu.:   0.0   BLQ :  35   
##  Po  :   3   Mn  :125     GLQ :431     Median : 350.5   GLQ :  20   
##  TA  :1295   No  :951     LwQ : 80     Mean   : 439.2   LwQ :  41   
##  NA's:  45   NA's: 44     Rec :155     3rd Qu.: 753.5   Rec :  51   
##                           Unf :421     Max.   :4010.0   Unf :1237   
##                           NA's: 42     NA's   :1        NA's:  42   
##    BsmtFinSF2        BsmtUnfSF       TotalBsmtSF   Heating     HeatingQC
##  Min.   :   0.00   Min.   :   0.0   Min.   :   0   GasA:1446   Ex:752   
##  1st Qu.:   0.00   1st Qu.: 219.2   1st Qu.: 784   GasW:   9   Fa: 43   
##  Median :   0.00   Median : 460.0   Median : 988   Grav:   2   Gd:233   
##  Mean   :  52.62   Mean   : 554.3   Mean   :1046   Wall:   2   Po:  2   
##  3rd Qu.:   0.00   3rd Qu.: 797.8   3rd Qu.:1305               TA:429   
##  Max.   :1526.00   Max.   :2140.0   Max.   :5095                        
##  NA's   :1         NA's   :1        NA's   :1                           
##  CentralAir Electrical     X1stFlrSF        X2ndFlrSF     LowQualFinSF     
##  N: 101     FuseA:  94   Min.   : 407.0   Min.   :   0   Min.   :   0.000  
##  Y:1358     FuseF:  23   1st Qu.: 873.5   1st Qu.:   0   1st Qu.:   0.000  
##             FuseP:   5   Median :1079.0   Median :   0   Median :   0.000  
##             SBrkr:1337   Mean   :1156.5   Mean   : 326   Mean   :   3.543  
##                          3rd Qu.:1382.5   3rd Qu.: 676   3rd Qu.:   0.000  
##                          Max.   :5095.0   Max.   :1862   Max.   :1064.000  
##                                                                            
##    GrLivArea     BsmtFullBath     BsmtHalfBath       FullBath    
##  Min.   : 407   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:1118   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.000  
##  Median :1432   Median :0.0000   Median :0.0000   Median :2.000  
##  Mean   :1486   Mean   :0.4345   Mean   :0.0652   Mean   :1.571  
##  3rd Qu.:1721   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:2.000  
##  Max.   :5095   Max.   :3.0000   Max.   :2.0000   Max.   :4.000  
##                 NA's   :2        NA's   :2                       
##     HalfBath       BedroomAbvGr    KitchenAbvGr   KitchenQual  TotRmsAbvGrd   
##  Min.   :0.0000   Min.   :0.000   Min.   :0.000   Ex  :105    Min.   : 3.000  
##  1st Qu.:0.0000   1st Qu.:2.000   1st Qu.:1.000   Fa  : 31    1st Qu.: 5.000  
##  Median :0.0000   Median :3.000   Median :1.000   Gd  :565    Median : 6.000  
##  Mean   :0.3777   Mean   :2.854   Mean   :1.042   TA  :757    Mean   : 6.385  
##  3rd Qu.:1.0000   3rd Qu.:3.000   3rd Qu.:1.000   NA's:  1    3rd Qu.: 7.000  
##  Max.   :2.0000   Max.   :6.000   Max.   :2.000               Max.   :15.000  
##                                                                               
##    Functional     Fireplaces     FireplaceQu   GarageType   GarageYrBlt  
##  Typ    :1357   Min.   :0.0000   Ex  : 19    2Types : 17   Min.   :1895  
##  Min2   :  36   1st Qu.:0.0000   Fa  : 41    Attchd :853   1st Qu.:1959  
##  Min1   :  34   Median :0.0000   Gd  :364    Basment: 17   Median :1979  
##  Mod    :  20   Mean   :0.5812   Po  : 26    BuiltIn: 98   Mean   :1978  
##  Maj1   :   5   3rd Qu.:1.0000   TA  :279    CarPort:  6   3rd Qu.:2002  
##  (Other):   5   Max.   :4.0000   NA's:730    Detchd :392   Max.   :2207  
##  NA's   :   2                                NA's   : 76   NA's   :78    
##  GarageFinish   GarageCars      GarageArea     GarageQual  GarageCond 
##  Fin :367     Min.   :0.000   Min.   :   0.0   Fa  :  76   Ex  :   1  
##  RFn :389     1st Qu.:1.000   1st Qu.: 318.0   Gd  :  10   Fa  :  39  
##  Unf :625     Median :2.000   Median : 480.0   Po  :   2   Gd  :   6  
##  NA's: 78     Mean   :1.766   Mean   : 472.8   TA  :1293   Po  :   7  
##               3rd Qu.:2.000   3rd Qu.: 576.0   NA's:  78   TA  :1328  
##               Max.   :5.000   Max.   :1488.0               NA's:  78  
##               NA's   :1       NA's   :1                               
##  PavedDrive   WoodDeckSF       OpenPorchSF     EnclosedPorch    
##  N: 126     Min.   :   0.00   Min.   :  0.00   Min.   :   0.00  
##  P:  32     1st Qu.:   0.00   1st Qu.:  0.00   1st Qu.:   0.00  
##  Y:1301     Median :   0.00   Median : 28.00   Median :   0.00  
##             Mean   :  93.17   Mean   : 48.31   Mean   :  24.24  
##             3rd Qu.: 168.00   3rd Qu.: 72.00   3rd Qu.:   0.00  
##             Max.   :1424.00   Max.   :742.00   Max.   :1012.00  
##                                                                 
##    X3SsnPorch       ScreenPorch        PoolArea        PoolQC       Fence     
##  Min.   :  0.000   Min.   :  0.00   Min.   :  0.000   Ex  :   2   GdPrv:  59  
##  1st Qu.:  0.000   1st Qu.:  0.00   1st Qu.:  0.000   Gd  :   1   GdWo :  58  
##  Median :  0.000   Median :  0.00   Median :  0.000   NA's:1456   MnPrv: 172  
##  Mean   :  1.794   Mean   : 17.06   Mean   :  1.744               MnWw :   1  
##  3rd Qu.:  0.000   3rd Qu.:  0.00   3rd Qu.:  0.000               NA's :1169  
##  Max.   :360.000   Max.   :576.00   Max.   :800.000                           
##                                                                               
##  MiscFeature    MiscVal             MoSold           YrSold        SaleType   
##  Gar2:   3   Min.   :    0.00   Min.   : 1.000   Min.   :2006   WD     :1258  
##  Othr:   2   1st Qu.:    0.00   1st Qu.: 4.000   1st Qu.:2007   New    : 117  
##  Shed:  46   Median :    0.00   Median : 6.000   Median :2008   COD    :  44  
##  NA's:1408   Mean   :   58.17   Mean   : 6.104   Mean   :2008   ConLD  :  17  
##              3rd Qu.:    0.00   3rd Qu.: 8.000   3rd Qu.:2009   CWD    :   8  
##              Max.   :17000.00   Max.   :12.000   Max.   :2010   (Other):  14  
##                                                                 NA's   :   1  
##  SaleCondition 
##  Abnorml:  89  
##  AdjLand:   8  
##  Alloca :  12  
##  Family :  26  
##  Normal :1204  
##  Partial: 120  
## 
#Clean Test Data

#add new factor level.
test$Alley = factor(test$Alley, levels=c(levels(test$Alley), "No Alley Access"))

test$BsmtQual = factor(test$BsmtQual, levels=c(levels(test$BsmtQual), "No Bsmt"))

test$BsmtCond = factor(test$BsmtCond, levels=c(levels(test$BsmtCond), "No Bsmt"))

test$BsmtExposure = factor(test$BsmtExposure, levels=c(levels(test$BsmtExposure), "No Bsmt"))

test$BsmtFinType1 = factor(test$BsmtFinType1, levels=c(levels(test$BsmtFinType1), "No Bsmt"))

test$BsmtFinType2 = factor(test$BsmtFinType2, levels=c(levels(test$BsmtFinType2), "No Bsmt"))

test$FireplaceQu = factor(test$FireplaceQu, levels=c(levels(test$FireplaceQu), "No Fireplace"))

test$GarageType = factor(test$GarageType, levels=c(levels(test$GarageType), "No Garage"))

test$GarageFinish = factor(test$GarageFinish, levels=c(levels(test$GarageFinish), "No Garage"))

test$GarageQual = factor(test$GarageQual, levels=c(levels(test$GarageQual), "No Garage"))

test$GarageCond = factor(test$GarageCond, levels=c(levels(test$GarageCond), "No Garage"))

test$PoolQC = factor(test$PoolQC, levels=c(levels(test$PoolQC), "No Pool"))

test$Fence = factor(test$Fence, levels=c(levels(test$Fence), "No Fence"))

test$MiscFeature = factor(test$MiscFeature, levels=c(levels(test$MiscFeature), "None"))

#convert NAs in these columns to the new levels (factors) or to 0 (basement and garage counts and areas)
test$Alley[is.na(test$Alley)] = "No Alley Access"

test$BsmtQual[is.na(test$BsmtQual)] = "No Bsmt"

test$BsmtCond[is.na(test$BsmtCond)] = "No Bsmt"

test$BsmtExposure[is.na(test$BsmtExposure)] = "No Bsmt"

test$BsmtFinType1[is.na(test$BsmtFinType1)] = "No Bsmt"

test$BsmtFinType2[is.na(test$BsmtFinType2)] = "No Bsmt"

test$FireplaceQu[is.na(test$FireplaceQu)] = "No Fireplace"

test$GarageType[is.na(test$GarageType)] = "No Garage"

test$GarageFinish[is.na(test$GarageFinish)] = "No Garage"

test$GarageQual[is.na(test$GarageQual)] = "No Garage"

test$GarageCond[is.na(test$GarageCond)] = "No Garage"

test$PoolQC[is.na(test$PoolQC)] = "No Pool"

test$Fence[is.na(test$Fence)] = "No Fence"

test$MiscFeature[is.na(test$MiscFeature)] = "None"

test$BsmtFinSF2[is.na(test$BsmtFinSF2)] = 0

test$BsmtFullBath[is.na(test$BsmtFullBath)] = 0

test$BsmtHalfBath[is.na(test$BsmtHalfBath)] = 0

test$BsmtUnfSF[is.na(test$BsmtUnfSF)] = 0

test$TotalBsmtSF[is.na(test$TotalBsmtSF)] = 0

test$BsmtFinSF1[is.na(test$BsmtFinSF1)] = 0

test$GarageCars[is.na(test$GarageCars)] = 0

test$GarageArea[is.na(test$GarageArea)] = 0
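
The numeric replacements above follow one rule: a missing basement or garage measurement is treated as 0. A minimal sketch of the same rule written as a loop over column names is given below. The vector `zero_cols` and the toy data frame are hypothetical; columns not listed, such as `LotFrontage`, are deliberately left untouched.

```r
# Toy rows standing in for `test`; the values are made up for illustration.
toy <- data.frame(
  BsmtFinSF1  = c(583, NA),
  GarageCars  = c(2, NA),
  LotFrontage = c(68, NA)
)

# Columns where a missing value is taken to mean "none", hence 0.
zero_cols <- c("BsmtFinSF1", "GarageCars")

for (col in zero_cols) {
  toy[[col]][is.na(toy[[col]])] <- 0
}

toy
```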

# GarageYrBlt cannot be > 2020
test %>% 
  filter(GarageYrBlt > 2020) #property id 2593 has GarageYrBlt > 2020, row 1133
##     Id MSSubClass MSZoning LotFrontage LotArea Street           Alley LotShape
## 1 2593         20       RL          68    8298   Pave No Alley Access      IR1
##   LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2
## 1         HLS    AllPub    Inside       Gtl       Timber       Norm       Norm
##   BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle
## 1     1Fam     1Story           8           5      2006         2007       Hip
##   RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond
## 1  CompShg     VinylSd     VinylSd       <NA>         NA        Gd        TA
##   Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1
## 1      PConc       Gd       TA           Av          GLQ        583
##   BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir
## 1          Unf          0       963        1546    GasA        Ex          Y
##   Electrical X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath
## 1      SBrkr      1564         0            0      1564            0
##   BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual
## 1            0        2        0            2            1          Ex
##   TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt
## 1            6        Typ          1          Gd     Attchd        2207
##   GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive
## 1          RFn          2        502         TA         TA          Y
##   WoodDeckSF OpenPorchSF EnclosedPorch X3SsnPorch ScreenPorch PoolArea  PoolQC
## 1        132           0             0          0           0        0 No Pool
##      Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
## 1 No Fence        None       0      9   2007      New       Partial
# Fill it with YearBuilt of that house
test[1133, 'GarageYrBlt'] <- test[1133, 'YearBuilt']
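
The assignment above relies on the row number 1133 staying fixed. As an alternative sketch, the same repair can be keyed on the condition itself, so it still works if rows are reordered or filtered. The names `max_year` and `bad` and the toy data are assumptions for illustration.

```r
# Toy rows standing in for `test`; only Id 2593 mirrors the record shown above.
toy <- data.frame(
  Id          = c(2592, 2593, 2594),
  YearBuilt   = c(2005, 2006, 1990),
  GarageYrBlt = c(2005, 2207, NA)
)

max_year <- 2020  # assumption: the same cutoff as in the filter above

# which() drops the NA comparisons, so rows without a garage year are skipped.
bad <- which(toy$GarageYrBlt > max_year)

# Replace each implausible garage year with the year the house was built.
toy$GarageYrBlt[bad] <- toy$YearBuilt[bad]

toy
```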

#YearBuilt, change to age, remove the original column
test <- test %>% 
  dplyr::mutate(Age = 2020 - YearBuilt) %>% 
  dplyr::select(-YearBuilt)

#GarageYrBlt change to GarageAge
test <- test %>% 
  dplyr::mutate(GarageAge = 2020 - GarageYrBlt) %>% 
  dplyr::select(-GarageYrBlt)

#YearRemodAdd, change to difference between 2020 and YearRemodAdd
test <- test %>%
  dplyr::mutate(YearRemodAdd = 2020 - YearRemodAdd)
test$FullBath <- factor(test$FullBath)
train$FullBath <- factor(train$FullBath)
levels(test$FullBath)
## [1] "0" "1" "2" "3" "4"
levels(train$FullBath)
## [1] "0" "1" "2" "3"
test$HalfBath <- factor(test$HalfBath)
train$HalfBath <- factor(train$HalfBath)
levels(test$HalfBath)
## [1] "0" "1" "2"
levels(train$HalfBath)
## [1] "0" "1" "2"
test$GarageCars <- factor(test$GarageCars)
train$GarageCars <- factor(train$GarageCars)
levels(test$GarageCars)
## [1] "0" "1" "2" "3" "4" "5"
levels(train$GarageCars)
## [1] "0" "1" "2" "3" "4"
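
The level listings above show that `FullBath` and `GarageCars` have a level in the test set ("4" and "5" respectively) that never occurs in the training set. One possible way to give both data frames the same level set, and to list the levels unseen in training, is sketched below. This is an illustration on made-up data, not the approach the analysis takes; `train_toy`, `test_toy`, and `lv` are hypothetical names.

```r
# Toy columns standing in for train$FullBath and test$FullBath.
train_toy <- data.frame(FullBath = c(0, 1, 2, 3, 2))
test_toy  <- data.frame(FullBath = c(1, 2, 4))

# Union of the values seen in either set, in ascending order.
lv <- sort(union(unique(train_toy$FullBath), unique(test_toy$FullBath)))

train_toy$FullBath <- factor(train_toy$FullBath, levels = lv)
test_toy$FullBath  <- factor(test_toy$FullBath,  levels = lv)

# Both factors now share one level set.
identical(levels(train_toy$FullBath), levels(test_toy$FullBath))

# Levels that actually occur in test but never in train.
unseen <- setdiff(levels(droplevels(test_toy$FullBath)),
                  levels(droplevels(train_toy$FullBath)))
unseen
```

A model fitted on the training rows has no information about the levels in `unseen`, so sharing the level set only keeps the encodings compatible; it does not make predictions for those rows reliable.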
tr_dummy <- dummyVars(log(SalePrice) ~ ., fullRank = TRUE, data = train) %>% 
    predict(train)

test$SalePrice <- NA

#omit the Utilities column (column 10), as it has only one factor level in the test data
test <- test[,-10]

te_dummy <- dummyVars(log(SalePrice) ~ ., fullRank = TRUE, data = test) %>% 
    predict(test)
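
Because the training and test sets are encoded separately, each dummy matrix only has columns for the levels present in its own data. The sketch below illustrates that effect and the intersect/setdiff bookkeeping on toy data. It uses base R's `model.matrix()` as a stand-in for caret's `dummyVars()`, so the column names differ from those in this report; all object names and values are made up for illustration.

```r
# Toy data: level "5" occurs only in test, level "0" only in train.
train_toy <- data.frame(GarageCars = factor(c(0, 1, 2, 1)),
                        LotArea    = c(8450, 9600, 11250, 9550))
test_toy  <- data.frame(GarageCars = factor(c(1, 2, 5)),
                        LotArea    = c(11622, 14267, 13830))

# Encode each set on its own and drop the intercept column.
tr_mm <- model.matrix(~ ., data = train_toy)[, -1, drop = FALSE]
te_mm <- model.matrix(~ ., data = test_toy)[, -1, drop = FALSE]

# Columns available in both encodings, and those found in only one.
common     <- intersect(colnames(te_mm), colnames(tr_mm))
only_test  <- setdiff(colnames(te_mm), colnames(tr_mm))
only_train <- setdiff(colnames(tr_mm), colnames(te_mm))

common
only_test
only_train
```

Here the dropped baseline level also differs between the two encodings ("0" in train, "1" in test), which is one reason it can be worth checking the one-sided columns before restricting both matrices to `common`.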
# Find common columns
(columns <- intersect(names(data.frame(te_dummy)),  names(data.frame(tr_dummy))))
##   [1] "Id"                       "MSSubClass"              
##   [3] "MSZoning.FV"              "MSZoning.RH"             
##   [5] "MSZoning.RL"              "MSZoning.RM"             
##   [7] "LotFrontage"              "LotArea"                 
##   [9] "Street.Pave"              "Alley.Pave"              
##  [11] "Alley.No.Alley.Access"    "LotShape.IR2"            
##  [13] "LotShape.IR3"             "LotShape.Reg"            
##  [15] "LandContour.HLS"          "LandContour.Low"         
##  [17] "LandContour.Lvl"          "LotConfig.CulDSac"       
##  [19] "LotConfig.FR2"            "LotConfig.FR3"           
##  [21] "LotConfig.Inside"         "LandSlope.Mod"           
##  [23] "LandSlope.Sev"            "Neighborhood.Blueste"    
##  [25] "Neighborhood.BrDale"      "Neighborhood.BrkSide"    
##  [27] "Neighborhood.ClearCr"     "Neighborhood.CollgCr"    
##  [29] "Neighborhood.Crawfor"     "Neighborhood.Edwards"    
##  [31] "Neighborhood.Gilbert"     "Neighborhood.IDOTRR"     
##  [33] "Neighborhood.MeadowV"     "Neighborhood.Mitchel"    
##  [35] "Neighborhood.NAmes"       "Neighborhood.NoRidge"    
##  [37] "Neighborhood.NPkVill"     "Neighborhood.NridgHt"    
##  [39] "Neighborhood.NWAmes"      "Neighborhood.OldTown"    
##  [41] "Neighborhood.Sawyer"      "Neighborhood.SawyerW"    
##  [43] "Neighborhood.Somerst"     "Neighborhood.StoneBr"    
##  [45] "Neighborhood.SWISU"       "Neighborhood.Timber"     
##  [47] "Neighborhood.Veenker"     "Condition1.Feedr"        
##  [49] "Condition1.Norm"          "Condition1.PosA"         
##  [51] "Condition1.PosN"          "Condition1.RRAe"         
##  [53] "Condition1.RRAn"          "Condition1.RRNe"         
##  [55] "Condition1.RRNn"          "Condition2.Feedr"        
##  [57] "Condition2.Norm"          "Condition2.PosA"         
##  [59] "Condition2.PosN"          "BldgType.2fmCon"         
##  [61] "BldgType.Duplex"          "BldgType.Twnhs"          
##  [63] "BldgType.TwnhsE"          "HouseStyle.1.5Unf"       
##  [65] "HouseStyle.1Story"        "HouseStyle.2.5Unf"       
##  [67] "HouseStyle.2Story"        "HouseStyle.SFoyer"       
##  [69] "HouseStyle.SLvl"          "OverallQual"             
##  [71] "OverallCond"              "YearRemodAdd"            
##  [73] "RoofStyle.Gable"          "RoofStyle.Gambrel"       
##  [75] "RoofStyle.Hip"            "RoofStyle.Mansard"       
##  [77] "RoofStyle.Shed"           "RoofMatl.Tar.Grv"        
##  [79] "RoofMatl.WdShake"         "RoofMatl.WdShngl"        
##  [81] "Exterior1st.AsphShn"      "Exterior1st.BrkComm"     
##  [83] "Exterior1st.BrkFace"      "Exterior1st.CBlock"      
##  [85] "Exterior1st.CemntBd"      "Exterior1st.HdBoard"     
##  [87] "Exterior1st.MetalSd"      "Exterior1st.Plywood"     
##  [89] "Exterior1st.Stucco"       "Exterior1st.VinylSd"     
##  [91] "Exterior1st.Wd.Sdng"      "Exterior1st.WdShing"     
##  [93] "Exterior2nd.AsphShn"      "Exterior2nd.Brk.Cmn"     
##  [95] "Exterior2nd.BrkFace"      "Exterior2nd.CBlock"      
##  [97] "Exterior2nd.CmentBd"      "Exterior2nd.HdBoard"     
##  [99] "Exterior2nd.ImStucc"      "Exterior2nd.MetalSd"     
## [101] "Exterior2nd.Plywood"      "Exterior2nd.Stone"       
## [103] "Exterior2nd.Stucco"       "Exterior2nd.VinylSd"     
## [105] "Exterior2nd.Wd.Sdng"      "Exterior2nd.Wd.Shng"     
## [107] "MasVnrType.BrkFace"       "MasVnrType.None"         
## [109] "MasVnrType.Stone"         "MasVnrArea"              
## [111] "ExterQual.Fa"             "ExterQual.Gd"            
## [113] "ExterQual.TA"             "ExterCond.Fa"            
## [115] "ExterCond.Gd"             "ExterCond.Po"            
## [117] "ExterCond.TA"             "Foundation.CBlock"       
## [119] "Foundation.PConc"         "Foundation.Slab"         
## [121] "Foundation.Stone"         "Foundation.Wood"         
## [123] "BsmtQual.Fa"              "BsmtQual.Gd"             
## [125] "BsmtQual.TA"              "BsmtQual.No.Bsmt"        
## [127] "BsmtCond.Gd"              "BsmtCond.Po"             
## [129] "BsmtCond.TA"              "BsmtCond.No.Bsmt"        
## [131] "BsmtExposure.Gd"          "BsmtExposure.Mn"         
## [133] "BsmtExposure.No"          "BsmtExposure.No.Bsmt"    
## [135] "BsmtFinType1.BLQ"         "BsmtFinType1.GLQ"        
## [137] "BsmtFinType1.LwQ"         "BsmtFinType1.Rec"        
## [139] "BsmtFinType1.Unf"         "BsmtFinType1.No.Bsmt"    
## [141] "BsmtFinSF1"               "BsmtFinType2.BLQ"        
## [143] "BsmtFinType2.GLQ"         "BsmtFinType2.LwQ"        
## [145] "BsmtFinType2.Rec"         "BsmtFinType2.Unf"        
## [147] "BsmtFinType2.No.Bsmt"     "BsmtFinSF2"              
## [149] "BsmtUnfSF"                "TotalBsmtSF"             
## [151] "Heating.GasW"             "Heating.Grav"            
## [153] "Heating.Wall"             "HeatingQC.Fa"            
## [155] "HeatingQC.Gd"             "HeatingQC.Po"            
## [157] "HeatingQC.TA"             "CentralAir.Y"            
## [159] "Electrical.FuseF"         "Electrical.FuseP"        
## [161] "Electrical.SBrkr"         "X1stFlrSF"               
## [163] "X2ndFlrSF"                "LowQualFinSF"            
## [165] "GrLivArea"                "BsmtFullBath"            
## [167] "BsmtHalfBath"             "FullBath.1"              
## [169] "FullBath.2"               "FullBath.3"              
## [171] "HalfBath.1"               "HalfBath.2"              
## [173] "BedroomAbvGr"             "KitchenAbvGr"            
## [175] "KitchenQual.Fa"           "KitchenQual.Gd"          
## [177] "KitchenQual.TA"           "TotRmsAbvGrd"            
## [179] "Functional.Maj2"          "Functional.Min1"         
## [181] "Functional.Min2"          "Functional.Mod"          
## [183] "Functional.Sev"           "Functional.Typ"          
## [185] "Fireplaces"               "FireplaceQu.Fa"          
## [187] "FireplaceQu.Gd"           "FireplaceQu.Po"          
## [189] "FireplaceQu.TA"           "FireplaceQu.No.Fireplace"
## [191] "GarageType.Attchd"        "GarageType.Basment"      
## [193] "GarageType.BuiltIn"       "GarageType.CarPort"      
## [195] "GarageType.Detchd"        "GarageType.No.Garage"    
## [197] "GarageFinish.RFn"         "GarageFinish.Unf"        
## [199] "GarageFinish.No.Garage"   "GarageCars.1"            
## [201] "GarageCars.2"             "GarageCars.3"            
## [203] "GarageCars.4"             "GarageArea"              
## [205] "GarageQual.Gd"            "GarageQual.Po"           
## [207] "GarageQual.TA"            "GarageQual.No.Garage"    
## [209] "GarageCond.Fa"            "GarageCond.Gd"           
## [211] "GarageCond.Po"            "GarageCond.TA"           
## [213] "GarageCond.No.Garage"     "PavedDrive.P"            
## [215] "PavedDrive.Y"             "WoodDeckSF"              
## [217] "OpenPorchSF"              "EnclosedPorch"           
## [219] "X3SsnPorch"               "ScreenPorch"             
## [221] "PoolArea"                 "PoolQC.Gd"               
## [223] "PoolQC.No.Pool"           "Fence.GdWo"              
## [225] "Fence.MnPrv"              "Fence.MnWw"              
## [227] "Fence.No.Fence"           "MiscFeature.Othr"        
## [229] "MiscFeature.Shed"         "MiscFeature.None"        
## [231] "MiscVal"                  "MoSold"                  
## [233] "YrSold"                   "SaleType.Con"            
## [235] "SaleType.ConLD"           "SaleType.ConLI"          
## [237] "SaleType.ConLw"           "SaleType.CWD"            
## [239] "SaleType.New"             "SaleType.Oth"            
## [241] "SaleType.WD"              "SaleCondition.AdjLand"   
## [243] "SaleCondition.Alloca"     "SaleCondition.Family"    
## [245] "SaleCondition.Normal"     "SaleCondition.Partial"   
## [247] "Age"                      "GarageAge"
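Because the two sets were dummy-encoded separately, a level present in only one of them produces a column that the other lacks. The sketch below uses abbreviated column names implied by the factor levels printed earlier (not the full design matrices) to show which columns `intersect()` keeps and which `setdiff()` reports as dropped. The vectors `tr_cols` and `te_cols` are illustrative assumptions.

```r
# Column names implied by the FullBath and GarageCars levels (abbreviated, illustrative)
tr_cols <- c("Id", "FullBath.1", "FullBath.2", "FullBath.3",
             "GarageCars.1", "GarageCars.2", "GarageCars.3", "GarageCars.4")
te_cols <- c("Id", "FullBath.1", "FullBath.2", "FullBath.3", "FullBath.4",
             "GarageCars.1", "GarageCars.2", "GarageCars.3", "GarageCars.4",
             "GarageCars.5")

common  <- intersect(te_cols, tr_cols)   # kept in both design matrices
only_te <- setdiff(te_cols, tr_cols)     # present in test only, so dropped
only_tr <- setdiff(tr_cols, te_cols)     # present in train only, so dropped

only_te
only_tr
```

Running `setdiff()` in both directions on the real column names is a quick way to audit what the intersection discards.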
# subset based on common columns
te_dummy <- te_dummy %>% 
  data.frame() %>% 
  dplyr::select(tidyselect::all_of(columns))
tr_dummy <- tr_dummy %>% 
  data.frame() %>% 
  dplyr::select(tidyselect::all_of(columns))
# Impute missings on the test set
clean_te_dummy <- preProcess(te_dummy, "medianImpute") %>% 
  predict(te_dummy)

all(complete.cases(clean_te_dummy))
## [1] TRUE
# Impute missings on the train set
clean_tr_dummy <- preProcess(tr_dummy, "medianImpute") %>% 
  predict(tr_dummy)

all(complete.cases(clean_tr_dummy))
## [1] TRUE
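The two `preProcess(..., "medianImpute")` calls above estimate the medians separately on each set. For comparison, here is a minimal base-R sketch of median imputation in which the medians are computed on the training rows and reused for the test rows. The data frames `tr` and `te` and the helper `impute_with()` are hypothetical, and this is not the code path used in the analysis.

```r
# Toy data with missing values (illustrative only)
tr <- data.frame(LotFrontage = c(60, 80, NA, 70), MasVnrArea = c(0, NA, 100, 200))
te <- data.frame(LotFrontage = c(NA, 65),         MasVnrArea = c(50, NA))

# Column medians estimated on the training data only
tr_medians <- vapply(tr, median, numeric(1), na.rm = TRUE)

# Replace each NA with the stored median of its column
impute_with <- function(df, medians) {
  for (col in names(medians)) {
    missing <- is.na(df[[col]])
    df[[col]][missing] <- medians[[col]]
  }
  df
}

tr_clean <- impute_with(tr, tr_medians)
te_clean <- impute_with(te, tr_medians)

all(complete.cases(tr_clean), complete.cases(te_clean))
```

Reusing the training medians keeps the test set from influencing its own preprocessing. Whether that matters here depends on how many values are missing.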
# combine clean train and test dummy
all_dummy <- rbind(clean_tr_dummy, clean_te_dummy)

# get variables that might need to be log transformed (|skewness| > 0.5)
trans <- ((skewness(all_dummy) %>% abs()) > 0.5) %>% which()

# keep only the variables whose |skewness| drops by more than 0.1 after a log transformation
all_dummy_temp <- all_dummy
all_dummy_temp[, trans] <- log(all_dummy[, trans] + 1)
log_index <- (abs(skewness(all_dummy)) - abs(skewness(all_dummy_temp)) > 0.1) %>% which()

# log transform these variables
all_dummy[, log_index] <- log(all_dummy[, log_index] + 1) 

# Additional features that need transforming (those not on the same scale as the others):
# BsmtUnfSF, TotalBsmtSF, GarageArea, YrSold
all_dummy <- all_dummy %>% 
  mutate(BsmtUnfSF = log(BsmtUnfSF + 1),
         GarageArea = log(GarageArea + 1),
         YrSold = log(2020 - YrSold + 1),
         TotalBsmtSF = log(TotalBsmtSF + 1))
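The selection rule used above has two steps: a variable is a candidate when its absolute skewness exceeds 0.5, and it is log-transformed only if doing so lowers the absolute skewness by more than 0.1. The sketch below applies that rule to a single toy vector. The helper `skew()` is an assumed moment-based definition written out by hand, since the package that supplies `skewness()` in the analysis is not shown and its estimator may differ slightly.

```r
# Moment-based sample skewness: m3 / m2^(3/2) (assumed definition)
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

x <- c(1, 2, 2, 3, 3, 3, 4, 50, 200)   # strongly right-skewed toy data

raw_skew <- skew(x)
log_skew <- skew(log1p(x))             # log1p(x) is log(x + 1)

# Same two-step rule as in the analysis
is_candidate <- abs(raw_skew) > 0.5
log_helps    <- abs(raw_skew) - abs(log_skew) > 0.1

c(raw = raw_skew, logged = log_skew)
is_candidate && log_helps
```

The second check matters because a log transform does not help every skewed column. A 0/1 dummy, for example, stays two-valued after `log(x + 1)`.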
# split train and test set and look for outliers
train_ready <- all_dummy %>% 
  filter(Id <= 1460)

train_ready['LogSalePrice'] <- log(train$SalePrice)

test_ready <- all_dummy %>% 
  filter(Id > 1460) %>% 
  dplyr::select(-Id)

colnames(train_ready) <- make.names(colnames(train_ready))
colnames(test_ready) <- make.names(colnames(test_ready))
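As a quick illustration of what `make.names()` does to column names, the sketch below runs it on a few raw names chosen for illustration. A name that starts with a digit gets an `X` prefix, and characters that are not valid in an R name become dots.

```r
# Illustrative raw names; not taken from the original data files
raw_names <- c("1stFlrSF", "3SsnPorch", "Alley.No Alley Access", "GrLivArea")

clean_names <- make.names(raw_names)
clean_names
# "X1stFlrSF" "X3SsnPorch" "Alley.No.Alley.Access" "GrLivArea"
```

`data.frame()` applies the same rule by default, so on columns that have already passed through it the call is likely to leave the names unchanged.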
# looking for potential "outliers", plot all the variables against LogSalePrice, 8 plots at a time
i <- 2
while (i <= 249 - 7)
{
  par(mfrow = c(2, 4))
  # par(mar=c(1, 1, 1, 1))

  for (j in i:(i + 7))
  {
    if (j > 248) break
    plot(train_ready[[j]], train_ready$LogSalePrice,
         xlab = names(train_ready)[j])
  }
  i <- i + 8
}
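The loop above hard-codes the column count. One alternative is to build the pages of column indices up front, as sketched below. The sketch assumes 249 columns with Id first and LogSalePrice last, which is what the loop bounds suggest. The names `n_cols`, `pred_idx`, and `pages` are illustrative.

```r
n_cols   <- 249                  # assumed: Id + 247 predictors + LogSalePrice
pred_idx <- 2:(n_cols - 1)       # skip Id (column 1) and the response (last column)

# Split the indices into consecutive groups of at most 8
pages <- split(pred_idx, ceiling(seq_along(pred_idx) / 8))

length(pages)                          # number of 2 x 4 plot pages
lengths(pages)[c(1, length(pages))]    # size of the first and the last page
```

Each element of `pages` can then drive one `par(mfrow = c(2, 4))` page, and replacing 249 with `ncol(train_ready)` removes the hard-coded bound.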

# According to these plots, 5 variables may contain some "outliers": OverallCond, LotFrontage, LotArea, X1stFlrSF, and GrLivArea; the suspect points are labeled with their Id in the figures below
ggplot(train_ready, aes(OverallCond, LogSalePrice)) + 
  geom_point() +  
  theme_minimal() +
  labs(title = "LogSalePrice & OverallCond") +
  geom_text(aes(label = ifelse(LogSalePrice > 12.5 & OverallCond < 2.5, Id, '')),
            hjust = 1.3)

ggplot(train_ready, aes(LotFrontage, LogSalePrice, label = Id)) + 
  geom_point() +
  theme_minimal() +
  labs(title = "LogSalePrice & LotFrontage") +
  geom_text(aes(label = ifelse(LotFrontage > 5.5, Id, '')),
            hjust = 1.3)

ggplot(train_ready, aes(LotArea, LogSalePrice, label = Id)) + 
  geom_point() +
  theme_minimal() +
  labs(title = "LogSalePrice & LotArea") +
  geom_text(aes(label = ifelse(LotArea > 11.5, Id, '')),
            hjust = 1.3)

ggplot(train_ready, aes(X1stFlrSF, LogSalePrice, label = Id)) + 
  geom_point() +
  theme_minimal() +
  labs(title = "LogSalePrice & X1stFlrSF") +
  geom_text(aes(label = ifelse(X1stFlrSF > 8.25, Id, '')),
            hjust = 1.3)

ggplot(train_ready, aes(GrLivArea, LogSalePrice, label = Id)) + 
  geom_point() +
  theme_minimal() +
  labs(title = "LogSalePrice & GrLivArea") +
  geom_text(aes(label = ifelse(LogSalePrice < 12.5 & GrLivArea > 8.2, Id, '')),
            hjust = 1.3)

# remove these outliers from the training set
train_ready <- train_ready %>%
  filter(!Id %in% c(379, 935, 1299, 707, 250, 336, 314, 524)) %>% 
  dplyr::select(-Id)

# now the train and test sets are ready for model fit
dim(train_ready)
## [1] 1452  248
dim(test_ready)
## [1] 1459  247
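The training row count above can be checked against the outlier filter. The sketch below assumes that the training Ids run from 1 to 1460 with one row each, as implied by `Id <= 1460`, and uses a toy data frame that holds only the Id column.

```r
outlier_ids <- c(379, 935, 1299, 707, 250, 336, 314, 524)  # Ids from the filter above
n_train     <- 1460                                        # assumed number of training rows

toy  <- data.frame(Id = seq_len(n_train))
kept <- toy[!toy$Id %in% outlier_ids, , drop = FALSE]

nrow(kept)   # 1460 - 8 = 1452, matching dim(train_ready)
```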
set.seed(123)

myControl <- trainControl(method = "repeatedcv",
                          number = 20,
                          repeats = 10)

train_ready1 <- train_ready %>% 
  dplyr::select(-LogSalePrice) 

model <- train(y = train_ready$LogSalePrice,
               x = train_ready1,
               method = "glmnet",
               preProcess = c("center","scale"),
               trControl = myControl)
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosA
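A zero-variance warning like the one above is what one would expect when a rare dummy column becomes constant in the rows used for fitting within a resample, so that it cannot be centered and scaled. The base-R sketch below reproduces the situation with a toy column that has a single non-zero row and a toy 20-fold assignment. All names are hypothetical, and this is not how caret builds its folds.

```r
n       <- 40
rare    <- c(1, rep(0, n - 1))          # dummy column with a single non-zero row
fold_id <- rep(1:20, length.out = n)    # toy 20-fold assignment (row 1 is in fold 1)

# For each fold, check whether the column is constant in the rows used for fitting
constant_in_fit <- vapply(1:20, function(k) {
  fit_rows <- fold_id != k
  length(unique(rare[fit_rows])) == 1
}, logical(1))

which(constant_in_fit)   # folds in which the column has zero variance

# Columns with very few non-zero rows are the ones at risk
sum(rare != 0)
```

caret's `nearZeroVar()` can flag such columns before training, and `"zv"` or `"nzv"` can be added to the `preProcess` vector to filter them inside each resample. Whether to drop them is a modelling choice that the original analysis does not make.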

# The warning above repeats for many resamples; the columns it flags include:
# Condition1.RRNe, Condition2.PosA, Condition2.PosN, Electrical.FuseP, ExterCond.Po,
# Exterior1st.AsphShn, Exterior1st.BrkComm, Exterior1st.CBlock, Exterior2nd.CBlock,
# Functional.Sev, HeatingQC.Po, PoolQC.Gd, RoofStyle.Shed

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: ExterCond.Po, Functional.Sev

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: ExterCond.Po, Functional.Sev
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: HeatingQC.Po

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: HeatingQC.Po

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: HeatingQC.Po
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosN, Exterior1st.AsphShn

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosN, Exterior1st.AsphShn

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosN, Exterior1st.AsphShn
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosA

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosA

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosA
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19,
## uniqueCut = 10, : These variables have zero variances: Exterior1st.CBlock,
## Exterior2nd.CBlock

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19,
## uniqueCut = 10, : These variables have zero variances: Exterior1st.CBlock,
## Exterior2nd.CBlock

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19,
## uniqueCut = 10, : These variables have zero variances: Exterior1st.CBlock,
## Exterior2nd.CBlock
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosA

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosA

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosA
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: HeatingQC.Po

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: HeatingQC.Po

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: HeatingQC.Po
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Exterior1st.AsphShn

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Exterior1st.AsphShn

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Exterior1st.AsphShn
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosN

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosN

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Condition2.PosN
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19,
## uniqueCut = 10, : These variables have zero variances: Exterior1st.CBlock,
## Exterior2nd.CBlock

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19,
## uniqueCut = 10, : These variables have zero variances: Exterior1st.CBlock,
## Exterior2nd.CBlock

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19,
## uniqueCut = 10, : These variables have zero variances: Exterior1st.CBlock,
## Exterior2nd.CBlock
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Functional.Sev

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Functional.Sev

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Functional.Sev
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: ExterCond.Po

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: ExterCond.Po

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: ExterCond.Po
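The warnings above most likely arise because centering and scaling are applied inside each resample: a dummy column for a rare level (for example `Functional.Sev`) can be constant within the rows a single fold trains on, even though it varies in the full data. A minimal base-R sketch of spotting such columns, on a made-up data frame (`toy`, `is_constant`, and `fold_train` are illustrative names, not objects from this analysis):

```r
# Toy data: `rare` is a dummy column with a single 1,
# as a rarely observed factor level would produce.
toy <- data.frame(
  area = c(10, 12, 9, 15, 11, 14),
  rare = c(0, 0, 0, 0, 0, 1)
)

# A column has zero variance when it holds fewer than two distinct values.
is_constant <- function(df) {
  vapply(df, function(col) length(unique(col)) < 2, logical(1))
}

# On the full data no column is constant ...
is_constant(toy)

# ... but a resample that leaves out row 6 sees `rare` as all zeros.
fold_train <- toy[-6, ]
constant_cols <- names(fold_train)[is_constant(fold_train)]
constant_cols
```

caret's `preProcess` also accepts `"zv"` in its `method` argument to drop zero-variance columns before centering and scaling; whether that is appropriate here depends on how `train()` was set up earlier in the document.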
model
## glmnet 
## 
## 1452 samples
##  247 predictor
## 
## Pre-processing: centered (247), scaled (247) 
## Resampling: Cross-Validated (20 fold, repeated 10 times) 
## Summary of sample sizes: 1379, 1380, 1380, 1379, 1379, 1380, ... 
## Resampling results across tuning parameters:
## 
##   alpha  lambda        RMSE       Rsquared   MAE       
##   0.10   0.0006555019  0.1153852  0.9169069  0.07967681
##   0.10   0.0065550192  0.1139768  0.9188340  0.07932931
##   0.10   0.0655501923  0.1186729  0.9150325  0.08352369
##   0.55   0.0006555019  0.1135127  0.9194134  0.07869450
##   0.55   0.0065550192  0.1137109  0.9193739  0.07980304
##   0.55   0.0655501923  0.1524831  0.8760187  0.10738070
##   1.00   0.0006555019  0.1131413  0.9199141  0.07847729
##   1.00   0.0065550192  0.1159509  0.9166443  0.08144864
##   1.00   0.0655501923  0.1825345  0.8322606  0.13267675
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.0006555019.
# in-sample RMSE
rmse <- function(actual, fitted) sqrt(mean((actual - fitted)^2))
rmse(train_ready$LogSalePrice, fitted(model))
## [1] 0.0931256
# In-sample R-squared

y_predicted <- predict(model, s = opt_lambda, newx = x)

# Sum of Squares Total and Error
sst <- sum((train_ready$LogSalePrice - mean(train_ready$LogSalePrice))^2)
sse <- sum((y_predicted - train_ready$LogSalePrice)^2)

# R squared
rsq <- 1 - sse / sst
rsq
## [1] 0.945377
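The two in-sample figures are linked: because RMSE is sqrt(SSE / n), R-squared can equally be written as 1 - n * RMSE^2 / SST. A small sketch of that identity on simulated data (`actual` and `fitted_vals` are made-up vectors, not the model's output):

```r
rmse <- function(actual, fitted) sqrt(mean((actual - fitted)^2))

# Simulated log-price-like values and noisy "fitted" values (illustrative only).
set.seed(1)
actual <- rnorm(50, mean = 12, sd = 0.4)
fitted_vals <- actual + rnorm(50, sd = 0.1)

# Sum of squares total and sum of squared errors.
sst <- sum((actual - mean(actual))^2)
sse <- sum((fitted_vals - actual)^2)

# R-squared computed directly and via RMSE.
rsq_direct <- 1 - sse / sst
rsq_from_rmse <- 1 - length(actual) * rmse(actual, fitted_vals)^2 / sst

all.equal(rsq_direct, rsq_from_rmse)
```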
varImp(model)
## glmnet variable importance
## 
##   only 20 most important variables shown (out of 247)
## 
##                      Overall
## GrLivArea             100.00
## MSZoning.RL            69.99
## Age                    56.42
## MSZoning.RM            52.19
## OverallQual            46.63
## MSZoning.FV            40.45
## TotalBsmtSF            37.97
## OverallCond            36.89
## LotArea                32.83
## KitchenQual.Gd         24.35
## BsmtFinSF1             23.85
## KitchenQual.TA         22.16
## X1stFlrSF              21.08
## SaleType.New           19.26
## MSZoning.RH            18.37
## SaleCondition.Normal   18.35
## GarageArea             17.23
## Neighborhood.Crawfor   16.99
## Condition1.Norm        16.35
## BsmtQual.TA            14.52
# Coefficients of the final model at the selected lambda
coef(model$finalModel, model$finalModel$tuneValue$lambda)
## 248 x 1 sparse Matrix of class "dgCMatrix"
##                                      1
## (Intercept)               1.202162e+01
## MSSubClass               -3.991044e-03
## MSZoning.FV               5.113478e-02
## MSZoning.RH               2.321794e-02
## MSZoning.RL               8.846763e-02
## MSZoning.RM               6.597100e-02
## LotFrontage               5.868957e-03
## LotArea                   4.150094e-02
## Street.Pave               6.708622e-03
## Alley.Pave                3.041393e-03
## Alley.No.Alley.Access     .           
## LotShape.IR2              1.278005e-03
## LotShape.IR3              .           
## LotShape.Reg              1.405075e-03
## LandContour.HLS           1.551400e-04
## LandContour.Low          -2.697715e-03
## LandContour.Lvl           5.586666e-04
## LotConfig.CulDSac         4.979968e-03
## LotConfig.FR2            -5.907982e-03
## LotConfig.FR3            -2.784848e-03
## LotConfig.Inside         -4.837638e-03
## LandSlope.Mod             2.881985e-03
## LandSlope.Sev            -6.271000e-03
## Neighborhood.Blueste      7.741745e-04
## Neighborhood.BrDale       1.774698e-03
## Neighborhood.BrkSide      8.630767e-03
## Neighborhood.ClearCr      5.878983e-03
## Neighborhood.CollgCr      .           
## Neighborhood.Crawfor      2.148244e-02
## Neighborhood.Edwards     -1.248026e-02
## Neighborhood.Gilbert     -2.098065e-03
## Neighborhood.IDOTRR      -4.428967e-03
## Neighborhood.MeadowV     -5.588257e-03
## Neighborhood.Mitchel     -7.152946e-03
## Neighborhood.NAmes       -2.567543e-04
## Neighborhood.NoRidge      1.521512e-02
## Neighborhood.NPkVill      3.484298e-03
## Neighborhood.NridgHt      1.356846e-02
## Neighborhood.NWAmes      -3.365189e-03
## Neighborhood.OldTown     -5.202152e-03
## Neighborhood.Sawyer       .           
## Neighborhood.SawyerW      2.271896e-04
## Neighborhood.Somerst      4.816009e-03
## Neighborhood.StoneBr      1.584397e-02
## Neighborhood.SWISU        1.555459e-03
## Neighborhood.Timber       .           
## Neighborhood.Veenker      2.035590e-03
## Condition1.Feedr          3.099270e-03
## Condition1.Norm           2.067201e-02
## Condition1.PosA           1.189702e-04
## Condition1.PosN           7.819709e-03
## Condition1.RRAe          -6.249307e-03
## Condition1.RRAn           2.200276e-03
## Condition1.RRNe           .           
## Condition1.RRNn           4.852167e-03
## Condition2.Feedr          6.999054e-04
## Condition2.Norm           1.682175e-03
## Condition2.PosA           4.868347e-03
## Condition2.PosN          -1.434482e-03
## BldgType.2fmCon           .           
## BldgType.Duplex          -1.313686e-03
## BldgType.Twnhs           -1.340000e-03
## BldgType.TwnhsE           .           
## HouseStyle.1.5Unf         4.073782e-03
## HouseStyle.1Story         .           
## HouseStyle.2.5Unf         .           
## HouseStyle.2Story         .           
## HouseStyle.SFoyer         3.219409e-03
## HouseStyle.SLvl          -7.727582e-06
## OverallQual               5.893863e-02
## OverallCond               4.663728e-02
## YearRemodAdd             -1.226847e-02
## RoofStyle.Gable          -2.047457e-03
## RoofStyle.Gambrel        -7.821124e-04
## RoofStyle.Hip             .           
## RoofStyle.Mansard         2.468075e-03
## RoofStyle.Shed            2.279823e-03
## RoofMatl.Tar.Grv         -9.137340e-05
## RoofMatl.WdShake          3.242515e-04
## RoofMatl.WdShngl          8.900862e-03
## Exterior1st.AsphShn      -1.257890e-03
## Exterior1st.BrkComm      -5.960156e-03
## Exterior1st.BrkFace       1.664689e-02
## Exterior1st.CBlock       -2.675509e-03
## Exterior1st.CemntBd       .           
## Exterior1st.HdBoard      -1.097829e-03
## Exterior1st.MetalSd       2.591397e-03
## Exterior1st.Plywood       .           
## Exterior1st.Stucco        1.258306e-03
## Exterior1st.VinylSd       .           
## Exterior1st.Wd.Sdng      -5.511834e-03
## Exterior1st.WdShing       .           
## Exterior2nd.AsphShn       .           
## Exterior2nd.Brk.Cmn       .           
## Exterior2nd.BrkFace      -7.227777e-03
## Exterior2nd.CBlock       -1.051713e-05
## Exterior2nd.CmentBd       2.186333e-03
## Exterior2nd.HdBoard       .           
## Exterior2nd.ImStucc       1.086262e-03
## Exterior2nd.MetalSd       .           
## Exterior2nd.Plywood      -1.990209e-03
## Exterior2nd.Stone        -2.918144e-03
## Exterior2nd.Stucco        .           
## Exterior2nd.VinylSd       .           
## Exterior2nd.Wd.Sdng       1.416899e-03
## Exterior2nd.Wd.Shng      -7.764283e-05
## MasVnrType.BrkFace        2.586814e-04
## MasVnrType.None           .           
## MasVnrType.Stone          2.824442e-03
## MasVnrArea                .           
## ExterQual.Fa              2.960055e-03
## ExterQual.Gd              .           
## ExterQual.TA             -2.472109e-03
## ExterCond.Fa             -4.425156e-03
## ExterCond.Gd             -3.568844e-03
## ExterCond.Po             -1.325599e-03
## ExterCond.TA              .           
## Foundation.CBlock         6.942063e-03
## Foundation.PConc          1.318534e-02
## Foundation.Slab           8.606037e-04
## Foundation.Stone          4.131372e-03
## Foundation.Wood          -3.764061e-03
## BsmtQual.Fa              -3.774962e-03
## BsmtQual.Gd              -1.637289e-02
## BsmtQual.TA              -1.835797e-02
## BsmtQual.No.Bsmt          1.406334e-02
## BsmtCond.Gd               1.711089e-04
## BsmtCond.Po               4.216582e-04
## BsmtCond.TA               4.055845e-03
## BsmtCond.No.Bsmt          3.001244e-04
## BsmtExposure.Gd           1.526351e-02
## BsmtExposure.Mn           .           
## BsmtExposure.No          -4.382063e-03
## BsmtExposure.No.Bsmt      .           
## BsmtFinType1.BLQ         -4.002062e-03
## BsmtFinType1.GLQ          6.518093e-04
## BsmtFinType1.LwQ         -5.124648e-03
## BsmtFinType1.Rec         -4.470197e-03
## BsmtFinType1.Unf          .           
## BsmtFinType1.No.Bsmt      7.304471e-05
## BsmtFinSF1                3.015157e-02
## BsmtFinType2.BLQ         -4.454012e-03
## BsmtFinType2.GLQ          3.839595e-03
## BsmtFinType2.LwQ         -1.444193e-03
## BsmtFinType2.Rec         -1.917834e-03
## BsmtFinType2.Unf         -1.533640e-03
## BsmtFinType2.No.Bsmt      1.143727e-02
## BsmtFinSF2                .           
## BsmtUnfSF                -7.913153e-03
## TotalBsmtSF               4.800037e-02
## Heating.GasW              4.788652e-03
## Heating.Grav             -1.026162e-02
## Heating.Wall              4.766313e-04
## HeatingQC.Fa             -1.580712e-03
## HeatingQC.Gd             -5.848911e-03
## HeatingQC.Po              .           
## HeatingQC.TA             -1.345561e-02
## CentralAir.Y              1.342673e-02
## Electrical.FuseF          1.375978e-03
## Electrical.FuseP         -1.493356e-03
## Electrical.SBrkr         -1.380602e-03
## X1stFlrSF                 2.664419e-02
## X2ndFlrSF                 .           
## LowQualFinSF             -2.863179e-03
## GrLivArea                 1.264087e-01
## BsmtFullBath              1.245455e-02
## BsmtHalfBath              3.167602e-04
## FullBath.1                .           
## FullBath.2                5.401792e-03
## FullBath.3                1.326544e-02
## HalfBath.1                1.170164e-02
## HalfBath.2               -5.401237e-04
## BedroomAbvGr             -2.609019e-03
## KitchenAbvGr             -9.919043e-03
## KitchenQual.Fa           -6.875051e-03
## KitchenQual.Gd           -3.077885e-02
## KitchenQual.TA           -2.801235e-02
## TotRmsAbvGrd              .           
## Functional.Maj2          -1.136221e-02
## Functional.Min1           .           
## Functional.Min2           .           
## Functional.Mod           -2.346099e-03
## Functional.Sev           -6.186916e-03
## Functional.Typ            1.611680e-02
## Fireplaces                1.468716e-02
## FireplaceQu.Fa           -1.996970e-03
## FireplaceQu.Gd            .           
## FireplaceQu.Po            1.867342e-07
## FireplaceQu.TA            .           
## FireplaceQu.No.Fireplace  .           
## GarageType.Attchd         .           
## GarageType.Basment       -3.938209e-03
## GarageType.BuiltIn        3.254108e-04
## GarageType.CarPort       -4.083287e-03
## GarageType.Detchd         3.969795e-03
## GarageType.No.Garage      .           
## GarageFinish.RFn          .           
## GarageFinish.Unf         -8.560804e-04
## GarageFinish.No.Garage    .           
## GarageCars.1             -6.606109e-03
## GarageCars.2              .           
## GarageCars.3              1.553919e-02
## GarageCars.4              7.914046e-03
## GarageArea                2.177616e-02
## GarageQual.Gd             1.841943e-03
## GarageQual.Po             1.953065e-05
## GarageQual.TA             7.025446e-05
## GarageQual.No.Garage      .           
## GarageCond.Fa            -8.502955e-03
## GarageCond.Gd             2.078640e-05
## GarageCond.Po             9.254211e-04
## GarageCond.TA             .           
## GarageCond.No.Garage      .           
## PavedDrive.P             -1.215132e-03
## PavedDrive.Y              3.279317e-03
## WoodDeckSF                8.051286e-03
## OpenPorchSF               2.610796e-03
## EnclosedPorch             2.217023e-03
## X3SsnPorch                1.335480e-03
## ScreenPorch               1.059326e-02
## PoolArea                  5.044351e-03
## PoolQC.Gd                 5.878457e-03
## PoolQC.No.Pool            .           
## Fence.GdWo               -5.222105e-03
## Fence.MnPrv               2.403366e-04
## Fence.MnWw               -1.672652e-03
## Fence.No.Fence            .           
## MiscFeature.Othr         -2.908653e-03
## MiscFeature.Shed          .           
## MiscFeature.None          .           
## MiscVal                  -1.273056e-03
## MoSold                   -8.850514e-04
## YrSold                    2.579694e-03
## SaleType.Con              2.640037e-03
## SaleType.ConLD            5.596314e-03
## SaleType.ConLI            .           
## SaleType.ConLw            3.446600e-04
## SaleType.CWD              2.836195e-03
## SaleType.New              2.434819e-02
## SaleType.Oth              2.841873e-03
## SaleType.WD              -1.321595e-03
## SaleCondition.AdjLand     4.078395e-03
## SaleCondition.Alloca      1.132249e-03
## SaleCondition.Family      .           
## SaleCondition.Normal      2.319710e-02
## SaleCondition.Partial     .           
## Age                      -7.131935e-02
## GarageAge                -2.961980e-03
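The scores printed by `varImp(model)` line up with the absolute values of these coefficients, rescaled so that the largest equals 100. The sketch below checks this for four coefficients copied from the output above. The rescaling rule is inferred from the printed numbers rather than taken from caret's documentation, and it assumes the smallest absolute coefficient is 0, which holds here because several coefficients are shrunk to exactly zero.

```r
# Coefficients copied from the coef() output above.
coefs <- c(
  GrLivArea   =  1.264087e-01,
  MSZoning.RL =  8.846763e-02,
  Age         = -7.131935e-02,
  OverallQual =  5.893863e-02
)

# Absolute size relative to the largest coefficient, on a 0-100 scale.
importance <- round(100 * abs(coefs) / max(abs(coefs)), 2)
importance
```

The result matches the `Overall` column reported for the same four variables (100.00, 69.99, 56.42, 46.63). Note that `Age` ranks third despite its negative sign, since only magnitude enters the score.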
predictions <- predict(model, test_ready)
head(predictions)
## [1] 11.69230 11.97605 12.14106 12.19673 12.19166 12.04807
price_prediction <- data.frame(Id = test$Id,
                                 SalePrice = exp(predictions))
head(price_prediction)
##     Id SalePrice
## 1 1461  119647.1
## 2 1462  158902.4
## 3 1463  187411.2
## 4 1464  198139.3
## 5 1465  197139.0
## 6 1466  170768.7
all(complete.cases(price_prediction))
## [1] TRUE
# Export the prediction data.frame as a .csv file.
# row.names = FALSE keeps only the Id and SalePrice columns in the file.
write.csv(price_prediction, "price_prediction_0424.csv", row.names = FALSE)
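By default `write.csv()` also writes the row names as an unnamed first column, which is usually unwanted in a file meant to hold only `Id` and `SalePrice`. A sketch of the difference, using temporary files and the first two predictions shown above (`demo` and the temporary paths are illustrative):

```r
# Two rows copied from head(price_prediction).
demo <- data.frame(Id = c(1461, 1462), SalePrice = c(119647.1, 158902.4))

with_names <- tempfile(fileext = ".csv")
without_names <- tempfile(fileext = ".csv")

write.csv(demo, with_names)                        # default: row names included
write.csv(demo, without_names, row.names = FALSE)  # only Id and SalePrice

# Compare the header line of each file.
header_with <- readLines(with_names, n = 1)
header_without <- readLines(without_names, n = 1)
header_with
header_without
```

The first header starts with an empty quoted field for the row-name column; the second contains only the two named columns.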